137 research outputs found

    Joint mapping of genes and conditions via multidimensional unfolding analysis

    Background: Microarray compendia profile the expression of genes in a number of experimental conditions. Such data compendia are useful not only to group genes and conditions based on their similarity in overall expression over profiles but also to gain information on more subtle relations between genes and conditions. Getting a clear visual overview of all these patterns in a single easy-to-grasp representation is a useful preliminary analysis step: We propose to use for this purpose an advanced exploratory method, called multidimensional unfolding. Results: We present a novel algorithm for multidimensional unfolding that overcomes both general problems and problems that are specific for the analysis of gene expression data sets. Applying the algorithm to two publicly available microarray compendia illustrates its power as a tool for exploratory data analysis: The unfolding analysis of a first data set resulted in a two-dimensional representation which clearly reveals temporal regulation patterns for the genes and a meaningful structure for the time points, while the analysis of a second data set showed the algorithm's ability to go beyond a mere identification of those genes that discriminate between different patient or tissue types. Conclusion: Multidimensional unfolding offers a useful tool for preliminary explorations of microarray data: By relying on an easy-to-grasp low-dimensional geometric framework, relations among genes, among conditions and between genes and conditions are simultaneously represented in an accessible way which may reveal interesting patterns in the data. An additional advantage of the method is that it can be applied to the raw data without necessitating the choice of suitable genewise transformations of the data.

    Evaluation of time profile reconstruction from complex two-color microarray designs

    Background: As an alternative to the frequently used "reference design" for two-channel microarrays, other designs have been proposed. These designs have been shown to be more profitable from a theoretical point of view (more replicates of the conditions of interest for the same number of arrays). However, the interpretation of the measurements is less straightforward and a reconstruction method is needed to convert the observed ratios into the genuine profile of interest (e.g. a time profile). The potential advantages of using these alternative designs thus largely depend on the success of the profile reconstruction. Therefore, we compared to what extent different linear models agree with each other in reconstructing expression ratios and corresponding time profiles from a complex design. Results: On average the correlation between the estimated ratios was high, and all methods agreed with each other in predicting the same profile, especially for genes of which the expression profile showed a large variance across the different time points. Assessing the similarity in profile shape, it appears that, the more similar the underlying principles of the methods (model and input data), the more similar their results. Methods with a dye effect seemed more robust against array failure. The influence of a different normalization was not drastic and independent of the method used. Conclusion: Including a dye effect such as in the methods lmbr_dye, anovaFix and anovaMix compensates for residual dye related inconsistencies in the data and renders the results more robust against array failure. Including random effects requires more parameters to be estimated and is only advised when a design is used with a sufficient number of replicates. Because of this, we believe lmbr_dye, anovaFix and anovaMix are most appropriate for practical use.

    Meta Analysis of Gene Expression Data within and Across Species

    Since the second half of the 1990s, a large number of genome-wide analyses have been described that study gene expression at the transcript level. To this end, two major strategies have been adopted, a first one relying on hybridization techniques such as microarrays, and a second one based on sequencing techniques such as serial analysis of gene expression (SAGE), cDNA-AFLP, and analysis based on expressed sequence tags (ESTs). Despite both types of profiling experiments becoming routine techniques in many research groups, their application remains costly and laborious. As a result, the number of conditions profiled in individual studies is still relatively small and usually varies from only two to a few hundred samples for the largest experiments. More and more, scientific journals require the deposit of these high-throughput experiments in public databases upon publication. Mining the information present in these databases offers molecular biologists the possibility to view their own small-scale analysis in the light of what is already available. However, so far, the richness of the public information remains largely unexploited. Several obstacles such as the correct association between ESTs and microarray probes with the corresponding gene transcript, the incompleteness and inconsistency in the annotation of experimental conditions, and the lack of standardized experimental protocols to generate gene expression data, all impede the successful mining of these data. Here, we review the potential and difficulties of combining publicly available expression data from EST analyses and microarray experiments. With examples from the literature, we show how meta-analysis of expression profiling experiments can be used to study expression behavior in a single organism or between organisms, across a wide range of experimental conditions. We also provide an overview of the methods and tools that can aid molecular biologists in exploiting these public data.

    COLOMBOS v2.0 : an ever expanding collection of bacterial expression compendia

    The COLOMBOS database (http://www.colombos.net) features comprehensive organism-specific cross-platform gene expression compendia of several bacterial model organisms and is supported by a fully interactive web portal and an extensive web API. COLOMBOS was originally published in PLoS One, and COLOMBOS v2.0 includes both an update of the expression data, by expanding the previously available compendia and by adding compendia for several new species, and an update of the surrounding functionality, with improved search and visualization options and novel tools for programmatic access to the database. The scope of the database has also been extended to incorporate RNA-seq data in our compendia by a dedicated analysis pipeline. We demonstrate the validity and robustness of this approach by comparing the same RNA samples measured in parallel using both microarrays and RNA-seq. As far as we know, COLOMBOS currently hosts the largest homogenized gene expression compendia available for seven bacterial model organisms.

    Genome-wide detection of predicted non-coding RNAs in Rhizobium etli expressed during free-living and host-associated growth using a high-resolution tiling array

    Non-coding RNAs (ncRNAs) play a crucial role in the intricate regulation of bacterial gene expression, allowing bacteria to quickly adapt to changing environments. In the past few years, a growing number of regulatory RNA elements have been predicted by computational methods, mostly in well-studied gamma-proteobacteria but lately in several alpha-proteobacteria as well. Here, we have compared an extensive compilation of these non-coding RNA predictions to intergenic expression data of a whole-genome high-resolution tiling array in the soil-dwelling alpha-proteobacterium Rhizobium etli.

    A COMPASS for VESPUCCI: a FAIR way to explore the grapevine transcriptomic landscape

    Moretto, M.; Sonego, P.; Pilati, S.; Matus, J.T.; Costantini, L.; Malacarne, G.; Engelen, K.

    COLOMBOS v3.0: leveraging gene expression compendia for cross-species analyses

    COLOMBOS is a database that integrates publicly available transcriptomics data for several prokaryotic model organisms. Compared to the previous version it has more than doubled in size, both in terms of species and data available. The manually curated condition annotation has been overhauled as well, giving more complete information about samples' experimental conditions and their differences. Functionality-wise, cross-species analyses now enable users to analyse expression data for all species simultaneously, and identify candidate genes with evolutionarily conserved expression behaviour. All the expression-based query tools have undergone a substantial improvement, overcoming the limit of enforced co-expression data retrieval and instead enabling the return of more complex patterns of expression behaviour. COLOMBOS is freely available through a web application at http://colombos.net/. The complete database is also accessible via REST API or downloadable as tab-delimited text files.
    Moretto, Marco; Sonego, Paolo; Dierckxsens, Nicolas; Brilli, Matteo; Bianco, Luca; Ledezma-Tejeida, Daniela; Gama-Castro, Socorro; Galardini, Marco; Romualdi, Chiara; Laukens, Kris; Collado-Vides, Julio; Meysman, Pieter; Engelen, Kristof

    Stress response regulators identified through genome-wide transcriptome analysis of the (p)ppGpp-dependent response in Rhizobium etli

    Background: The alarmone (p)ppGpp mediates a global reprogramming of gene expression upon nutrient limitation and other stresses to cope with these unfavorable conditions. Synthesis of (p)ppGpp is, in most bacteria, controlled by RelA/SpoT (Rsh) proteins. The role of (p)ppGpp has been characterized primarily in Escherichia coli and several Gram-positive bacteria. Here, we report the first in-depth analysis of the (p)ppGpp-regulon in an alpha-proteobacterium using a high-resolution tiling array to better understand the pleiotropic stress phenotype of a relA/rsh mutant. Results: We compared gene expression of the Rhizobium etli wild type and rsh (previously rel) mutant during exponential and stationary phase, identifying numerous (p)ppGpp targets, including small non-coding RNAs. The majority of the 834 (p)ppGpp-dependent genes were detected during stationary phase. Unexpectedly, 223 genes were expressed (p)ppGpp-dependently during early exponential phase, indicating the hitherto unrecognized importance of (p)ppGpp during active growth. Furthermore, we identified two (p)ppGpp-dependent key regulators for survival during heat and oxidative stress and one regulator putatively involved in metabolic adaptation, namely extracytoplasmic function sigma factor EcfG2/PF00052, transcription factor CH00371, and serine protein kinase PrkA. Conclusions: The regulatory role of (p)ppGpp in R. etli stress adaptation is far-reaching in redirecting gene expression during all growth phases. Genome-wide transcriptome analysis of a strain deficient in a global regulator, and exhibiting a pleiotropic phenotype, enables the identification of more specific regulators that control genes associated with a subset of stress phenotypes. This work is an important step toward a full understanding of the regulatory network underlying stress responses in alpha-proteobacteria.

    Inferring transcriptional modules from ChIP-chip, motif and microarray data

    'ReMoDiscovery' is an intuitive algorithm to correlate regulatory programs with regulators and corresponding motifs to a set of co-expressed genes. It exploits in a concurrent way three independent data sources: ChIP-chip data, motif information and gene expression profiles. When compared to published module discovery algorithms, ReMoDiscovery is fast and easily tunable. We evaluated our method on yeast data, where it was shown to generate biologically meaningful findings and allowed the prediction of potential novel roles of transcriptional regulators.